Paper review
Zuse Institute Berlin
The authors propose a bridge between two disciplines: complex systems science (CSS) and multi-agent reinforcement learning (MARL).
Tip
“Intelligence refers to the general ability to achieve a diverse set of goals”
According to the authors, the main focus of CSS is understanding how macro-level behavior emerges from simple micro-level processes, mostly through qualitative, conceptual models.
Usually in low-dimensional environments
Strong emphasis on simple mechanisms
Computationally lightweight
Cooperation is usually “baked into” the model: an explicit model component is added to capture it.
Usual toolset
Main focus: Improve the cooperation mechanics (then study them)
Usually very high-dimensional state spaces
Cooperation is typically not given as an “RL action”; it emerges as a learned behavior
Very computationally expensive
Usual toolset:
Inherent stochasticity and a large number of parameters make interpretation hard
Important
“MARL simulations (by themselves) do not facilitate analytically reliable insights into how collective cooperation emerges from complex human and machine behavior in dynamic environments.”
For the time being, we skip the technicalities and focus on the main goal
Definition
Collective Reinforcement Learning Dynamics (CRLD): Uses techniques from nonlinear dynamical systems to model the emergence of cooperative intelligence in a MARL environment.
Focus on low-dimensional environments, unlike MARL
Simplifies the stochastic nature & the computationally demanding RL algorithms by turning them into (deterministic) differential equations
The CSS techniques and approaches are what allow CRLD to bring more insight to MARL.
What CSS brings to the table
Complex phenomena: CRLD has the power to uncover the emergent behavior at a much lighter computational load
Multistability: CRLD can characterize entire dynamics, like
Critical transitions: robust way to find hysteretic behavior.
Collective memory: Hysteresis observed in MARL is a form of collective memory
Now we focus on what the MARL techniques can offer to enrich the CSS approach
Cooperation from individual cognition: CSS models the spread of successful strategies as copying. This assumes all agents share the same success criteria. MARL allows for more realism (e.g. intrinsic motivations, homeostasis)
Cooperation in large collectives: CSS must simplify individuals into a homogeneous category. The CRLD approach allows a collective of heterogeneous individuals to be treated as a dynamical system. It extends the “mean-field approaches” in MARL
Cooperation in dynamic environments: MARL naturally works with dynamic, uncertain, partially-observable environments, which is significantly more challenging in CSS.
Note
The proposed bridge built by CRLD aims to bring the comparative advantages of CSS and MARL together by developing a shared mathematical framework. Each field is strong where the other is weak, so the two complement one another.
But what is cooperation? What are agents cooperating for? What is learning behavior?
Definition
The joint strategy, denoted \(X_{t}\), collects the probabilities \(X_{t}^i(s, a)\) of each agent \(i\) choosing action \(a\) in state \(s\) at time \(t\). This is akin to the policy in standard RL parlance.
By learning behavior, we mean the dynamics of the joint strategy update.
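To make the definition concrete, here is a minimal sketch of how a joint strategy could be stored and checked. The nested-list layout `X[i][s][a]` and the helper names are assumptions made for illustration, not notation from the paper.

```python
import random

def uniform_joint_strategy(n_agents, n_states, n_actions):
    """Return X[i][s][a]: probability that agent i picks action a in state s."""
    p = 1.0 / n_actions
    return [[[p] * n_actions for _ in range(n_states)] for _ in range(n_agents)]

def is_valid_joint_strategy(X, tol=1e-9):
    """Check that every X[i][s] is a probability distribution over actions."""
    return all(
        all(prob >= 0 for prob in dist) and abs(sum(dist) - 1.0) < tol
        for agent in X
        for dist in agent
    )

def sample_action(X, i, s, rng=random):
    """Draw one action for agent i in state s according to X[i][s]."""
    return rng.choices(range(len(X[i][s])), weights=X[i][s])[0]
```

Learning behavior, in this picture, is whatever rule maps one such table to the next.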
Two agents play a public goods game
There is an immediate incentive to exploit, but the mutual optimum is cooperation
The environment is dynamic, consisting of two states: a prosperous one and a degraded one
Each defecting agent increases the probability of collapse
After a collapse to the degraded state, each agent suffers a negative impact until the prosperous state is re-established.
The ecological tipping environment exhibits hysteresis
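The environment described above can be sketched roughly as follows. All parameter names and values (`q_collapse`, `q_recover`, `benefit_factor`, `cost`, `impact`) are hypothetical, as is the linear dependence of the collapse probability on the fraction of defectors; the review only states that each defector raises that probability.

```python
import random

PROSPEROUS, DEGRADED = 0, 1
COOPERATE, DEFECT = 0, 1

def transition_probs(state, actions, q_collapse=0.2, q_recover=0.1):
    """Return (P(next = PROSPEROUS), P(next = DEGRADED)).

    Assumed form: the collapse probability grows linearly with the fraction
    of defectors; recovery has a fixed probability regardless of actions.
    """
    if state == PROSPEROUS:
        frac_defect = sum(a == DEFECT for a in actions) / len(actions)
        p_collapse = q_collapse * frac_defect
        return (1 - p_collapse, p_collapse)
    return (q_recover, 1 - q_recover)

def rewards(state, next_state, actions, benefit_factor=1.2, cost=5.0, impact=-5.0):
    """Public goods payoffs while prosperous; the collapse impact otherwise."""
    if state == PROSPEROUS and next_state == PROSPEROUS:
        n = len(actions)
        pot = benefit_factor * cost * sum(a == COOPERATE for a in actions)
        return [pot / n - (cost if a == COOPERATE else 0.0) for a in actions]
    return [impact] * len(actions)

def step(state, actions, rng=random):
    """Sample the next state and return it with each agent's reward."""
    p_prosperous, _ = transition_probs(state, actions)
    next_state = PROSPEROUS if rng.random() < p_prosperous else DEGRADED
    return next_state, rewards(state, next_state, actions)
```

With these assumed payoffs, a lone defector earns more than the cooperator it exploits, while mutual cooperation still beats mutual defection, which matches the incentive structure of the bullets above.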
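Multistability in CRLD applied to the ecological tipping environment. The dynamics equations used are based on temporal-difference learning \[ X_{t+1}^i (s,a) = \frac{1}{\xi_{X_t}^i (s)} X_{t}^i(s, a) \exp(\eta^i \cdot \delta_{X_t}^i(s, a)) \] Here \(\eta^i\) is the learning rate of agent \(i\), \(\delta_{X_t}^i(s, a)\) is its temporal-difference error under the current joint strategy, and \(\xi_{X_t}^i(s)\) normalizes the result so that the action probabilities in each state sum to one.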
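The update rule can be read as a multiplicative reweighting followed by a per-state normalization. The sketch below shows one such step for a single agent. The function names, the payoff numbers, and the toy temporal-difference error are assumptions: the toy uses only the immediate expected payoff in a single state, whereas the full error would also account for future values and environment transitions.

```python
import math

def crld_update(X_i, delta_i, eta_i):
    """Apply one strategy update for a single agent.

    X_i[s][a]     : current probability of action a in state s
    delta_i[s][a] : temporal-difference error for (s, a), supplied by the caller
    eta_i         : learning rate
    """
    X_next = []
    for probs, deltas in zip(X_i, delta_i):
        weights = [p * math.exp(eta_i * d) for p, d in zip(probs, deltas)]
        xi = sum(weights)  # normalization, one value per state
        X_next.append([w / xi for w in weights])
    return X_next

# Toy illustration (assumption): one state, two actions (cooperate, defect),
# with the immediate expected payoff standing in for the full TD error.
PAYOFF = [[1.0, -2.0],  # my payoff if I cooperate vs. (cooperate, defect)
          [3.0, 0.0]]   # my payoff if I defect    vs. (cooperate, defect)

def toy_delta(X_other):
    """Expected payoff of each own action against the other agent's strategy."""
    return [[sum(PAYOFF[a][b] * X_other[0][b] for b in range(2)) for a in range(2)]]

X = [[[0.5, 0.5]], [[0.5, 0.5]]]  # two agents, one state, two actions
for _ in range(50):
    # Both agents update simultaneously from the previous joint strategy.
    X = [crld_update(X[0], toy_delta(X[1]), 0.1),
         crld_update(X[1], toy_delta(X[0]), 0.1)]
```

The iteration is fully deterministic: the same starting strategy always yields the same trajectory. In this one-shot toy both strategies drift toward defection, because defecting pays more against any opponent strategy; the environment dynamics are what the full temporal-difference error would add on top.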
CRLD tries to combine the realism of MARL with the analytical tooling of CSS to understand emergent behaviour in multi-agent systems
CRLD can capture phenomena such as multistability, critical transitions, and collective memory
The paper reads as half review, half position paper. The call to action is to try to apply CRLD to as many things as possible